ABSTRACT 

Wi-Fi is a popular technology that allows an electronic device to exchange data or connect to the internet wirelessly using radio 
waves. Wi-Fi signals typically serve as information carriers between a transmitter and a receiver. Building on the same signals, 
Wi-Vi (Wi-Fi Vision) is a new technology that enables seeing through walls using Wi-Fi signals. It allows us to track moving humans 
through walls and behind closed doors. Wi-Vi relies on capturing the reflections of its own transmitted signals off moving objects 
behind a wall in order to track them. Wi-Vi's operation does not require access to any device on the other side of the wall. We show 
that Wi-Fi can also extend our senses, enabling us to see moving objects through walls and behind closed doors. In particular, we 
can use such signals to identify the number of people in a closed room and their relative locations. We can also identify simple 
gestures made behind a wall, and combine a sequence of gestures to communicate messages to a wireless receiver without 
carrying any transmitting device. The paper introduces two main innovations. First, it shows how one can use MIMO interference 
nulling to eliminate reflections off static objects and focus the receiver on a moving target. Second, it shows how one can track a 
human by treating the motion of a human body as an antenna array and tracking the resulting RF beam. 
1. INTRODUCTION 


Can Wi-Fi signals enable us to see through walls? For many years humans have fantasized about X-ray vision and 
played with the concept in comic books and sci-fi movies. This paper explores the potential of using Wi-Fi signals and 
recent advances in MIMO communications to build a device that can capture the motion of humans behind a wall and 
in closed rooms. Law enforcement personnel can use the device to avoid walking into an ambush, and minimize 
casualties in standoffs and hostage situations. Emergency responders can use it to see through rubble and collapsed 
structures. 

Ordinary users can leverage the device for gaming, intrusion detection, privacy-enhanced monitoring of children 
and the elderly, or personal security when stepping into dark alleys and unknown places. The concept underlying seeing 
through opaque obstacles is similar to radar and sonar imaging. Specifically, when faced with a non-metallic wall, a 
fraction of the RF signal would traverse the wall, reflect off objects and humans, and come back imprinted with a 
signature of what is inside a closed room. By capturing these reflections, we can image objects behind a wall. Building 
a device that can capture such reflections, however, is difficult because the signal power after traversing the wall 
twice (in and out of the room) is reduced by three to five orders of magnitude. Even more challenging are the 
reflections from the wall itself, which are much stronger than the reflections from objects inside the room. 
Reflections off the wall overwhelm the receiver’s analog-to-digital converter (ADC), preventing it from registering the 
minute variations due to reflections from objects behind the wall. This behavior is called the “Flash Effect” since it is 
analogous to how a mirror in front of a camera reflects the camera’s flash and prevents it from capturing objects in 
the scene. 

So how can one overcome these difficulties? The radar community has been investigating these issues, and has 
recently introduced a few ultra-wideband systems that can detect humans moving behind a wall, and show them as 
blobs moving in a dim background. Today’s state-of-the-art system requires 2 GHz of bandwidth, a large power 
source, and an 8-foot long antenna array (2.4 meters). Apart from the bulkiness of the device, blasting power in such 
a wide spectrum is infeasible for entities other than the military. The requirement for multi-GHz transmission is at the 
heart of how these systems work: they separate reflections off the wall from reflections from the objects behind the 
wall based on their arrival time, and hence need to identify sub-nanosecond delays (i.e., multi-GHz bandwidth) to 
filter the flash effect. To address these limitations, an initial attempt was made in 2012 by K. Chetty, G. Smith, and 
K. Woodbridge to use Wi-Fi to see through a wall. However, to mitigate the flash effect, this past proposal 
needs to install an additional receiver behind the wall, and connect the receivers behind and in front of the wall to a 
joint clock via wires. 

The objective of this paper is to enable a see-through-wall technology that is low-bandwidth, 
low-power, compact, and accessible to non-military entities. To this end, the paper introduces Wi-Vi, a 
see-through-wall device that employs Wi-Fi signals in the 2.4 GHz ISM band. Wi-Vi limits itself to a 20 MHz-wide Wi-Fi 
channel, and avoids the ultra-wideband solutions used today to address the flash effect. It also disposes of the large 
antenna array, typical in past systems, and uses instead a smaller 3-antenna MIMO radio. 

So, how does Wi-Vi eliminate the flash effect without using GHz of bandwidth? We observe that we can adapt recent 
advances in MIMO communications to through-wall imaging. In MIMO, multiple antenna systems can encode their 
transmissions so that the signal is nulled (i.e., sums up to zero) at a particular receive antenna. MIMO systems use 
this capability to eliminate interference to unwanted receivers. In contrast, we use nulling to eliminate reflections 
from static objects, including the wall. Specifically, a Wi-Vi device has two transmit antennas and a single receive 
antenna. Wi-Vi operates in two stages. In the first stage, it measures the channels from each of its two transmit 
antennas to its receive antenna. In the second stage, the two transmit antennas use the channel measurements from the 
first stage to null the signal at the receive antenna. Since wireless signals (including reflections) combine linearly 
over the medium, only reflections off objects that move between the two stages are captured in the second stage. 
Reflections off static objects, including the wall, are nulled in this stage. We refine this basic idea by 
introducing iterative nulling, which allows us to eliminate residual flash and the weaker reflections from static 
objects behind the wall. 

Second, how does Wi-Vi track moving objects without an antenna array? To address this challenge, we borrow a technique 
called inverse synthetic aperture radar (ISAR), which has been used for mapping the surfaces of the Earth and other 
planets. ISAR uses the movement of the target to emulate an antenna array. As shown in Figure 1, a device using an 
antenna array would capture a target from spatially spaced antennas and process this information to identify the 
direction of the target with respect to the array (i.e., the angle θ). In contrast, in ISAR, there is only one 
receive antenna; hence, at any point in time, we capture a single measurement. Nevertheless, since the target is 
moving, consecutive measurements in time emulate an inverse antenna array, i.e., it is as if the moving human is 
imaging the Wi-Vi device. By processing such consecutive measurements using standard antenna array beam steering, 
Wi-Vi can identify the spatial direction of the human. In Section 5.2 we extend 
this method to multiple moving targets. 
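
The beam-steering idea can be illustrated with a minimal sketch. It is not Wi-Vi's implementation: the element spacing (an assumed target speed multiplied by an assumed sampling interval), the simplified one-way plane-wave phase model, and the names `steer` and `spacing_m` are assumptions made purely for illustration. Consecutive channel samples from a single receive antenna are treated as the elements of a uniform linear array, and a conventional delay-and-sum beamformer scans candidate angles.

```python
import cmath
import math

WAVELENGTH_M = 0.125  # 12.5 cm, the wavelength quoted for Wi-Vi's signals
WAVENUMBER = 2 * math.pi / WAVELENGTH_M


def steer(samples, spacing_m, angle_rad):
    """Beam magnitude when the emulated array is steered toward angle_rad.

    Each time sample is treated as one array element; spacing_m is the
    assumed distance the target moves between two consecutive samples.
    """
    total = sum(
        sample * cmath.exp(1j * WAVENUMBER * spacing_m * i * math.sin(angle_rad))
        for i, sample in enumerate(samples)
    )
    return abs(total)


# Synthetic target (assumed): 1 m/s sampled every 10 ms gives 1 cm spacing.
spacing_m = 0.01
true_angle_rad = math.radians(30)
samples = [
    cmath.exp(-1j * WAVENUMBER * spacing_m * i * math.sin(true_angle_rad))
    for i in range(100)
]

# Scan candidate directions in 1-degree steps and pick the strongest beam.
candidates_deg = list(range(-90, 91))
spectrum = [steer(samples, spacing_m, math.radians(d)) for d in candidates_deg]
estimated_angle_deg = candidates_deg[spectrum.index(max(spectrum))]
print(estimated_angle_deg)
```

In this synthetic setting the strongest beam coincides with the simulated direction because the steering phases cancel the phase progression of the samples exactly at that angle. With real measurements the target speed is not known in advance, so the spacing would itself have to be assumed or estimated.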

Additionally, Wi-Vi leverages its ability to track motion to enable a through-wall gesture-based communication 
channel. Specifically, a human can communicate messages to a Wi-Vi receiver via gestures without carrying any 
wireless device. We have picked two simple body gestures to refer to “0” and “1” bits. A human behind a wall may 
use a short sequence of these gestures to send a message to Wi-Vi. After applying a matched filter, the message signal 
looks similar to standard BPSK encoding (a positive signal for a “1” bit, and a negative signal for a “0” bit) and can be 
decoded by considering the sign of the signal. The system enables law enforcement personnel to communicate with their 
team across a wall, even if their communication devices are confiscated. 

Figure 1: In (a), an antenna array is able to locate an object by steering its beam spatially. In (b), the moving 
object itself emulates an antenna array; hence, it acts as an inverse synthetic aperture. Wi-Vi leverages this 
principle in order to beamform the received signal in time (rather than in space). 


2. RELATED WORK 


Wi-Vi is related to past work in three major areas: 

Through-wall radar: Interest in through-wall imaging has been surging for about a decade. Earlier work in this domain 
focused on simulations and modeling. Recently, there have been some implementations tested with moving humans. These 
past systems eliminate the flash effect by isolating the signal reflected off the wall from signals reflected off 
objects behind the wall. This isolation can be achieved in the time domain, by using very short pulses (less than 1 ns) 
whereby the pulse reflected off the wall arrives earlier in time than that reflected off moving objects behind it. 
Alternatively, it may be achieved in the frequency domain by using a linear frequency chirp, as done by G. Charvat, 
L. Kempel, E. Rothwell, C. Coleman, and E. Mokole (2010). In this case, reflections off objects at different distances 
arrive with different tones. By analog filtering the tone that corresponds to the wall, one may remove the flash effect. 
These techniques require ultra-wide bandwidths (UWB) on the order of 2 GHz [11, 40]. Similarly, through-wall imaging 
products developed by industry, such as RadarVision by Time Domain Corporation, hinge on the same radar principles, 
require multiple GHz of bandwidth, and hence are targeted solely at the military. As a through-wall imaging technology, 
Wi-Vi differs from all the above systems in that it requires only a few MHz of bandwidth and operates in the same range 
as Wi-Fi. It overcomes the need for UWB by leveraging MIMO nulling to remove the flash effect. Researchers have 
recognized the limitations of UWB systems and explored the potential of using narrowband radars for through-wall 
sensing of the signal variations caused by moving objects behind the wall. However, the flash effect limits their 
detection capabilities. Hence, most of these systems are demonstrated either in simulation or in free space with no 
obstruction. 

The ones demonstrated with an obstruction use a low-attenuation standing wall, and do not work across higher 
attenuation materials such as solid wood or concrete. Wi-Vi shares the objectives of these devices; however, it 
introduces a new approach for eliminating the flash effect without wideband transmission. This enables it to work 
with concrete walls and solid wood doors, as well as fully closed rooms. The only attempt which we are aware of that 
uses Wi-Fi signals in order to see through walls was made in 2012. This system required both the transmitter and a 
reference receiver to be inside the imaged room. Furthermore, the reference receiver in the room has to be 
connected to the same clock as the receiver outside the room. In contrast, Wi-Vi can perform through-wall imaging 
without access to any device on the other side of the wall. 
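
The link between bandwidth and the ability to separate the wall from objects behind it by arrival time can be made concrete with a short calculation. The sketch below uses the standard rule of thumb that a signal of bandwidth B resolves delays of roughly 1/B, and round-trip distances of roughly c/(2B). The function names are hypothetical, and the figures are order-of-magnitude estimates rather than measurements of any particular system.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def delay_resolution_ns(bandwidth_hz):
    """Approximate smallest resolvable delay (in nanoseconds) for a given bandwidth."""
    return 1e9 / bandwidth_hz


def range_resolution_m(bandwidth_hz):
    """Approximate round-trip range resolution (in metres) for a given bandwidth."""
    return SPEED_OF_LIGHT_M_PER_S / (2 * bandwidth_hz)


for label, bandwidth_hz in [("UWB radar, 2 GHz", 2e9), ("Wi-Fi channel, 20 MHz", 20e6)]:
    print(label, delay_resolution_ns(bandwidth_hz), "ns", range_resolution_m(bandwidth_hz), "m")
```

Under this approximation a 2 GHz system resolves sub-nanosecond delays (a few centimetres of range), whereas a 20 MHz channel resolves delays of about 50 ns (several metres). A narrowband device therefore cannot rely on arrival time to separate the wall from what is behind it, which is why Wi-Vi turns to nulling instead.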

Gesture-based interfaces: Today, commercial gesture-recognition systems, such as the Xbox Kinect and Nintendo Wii, 
can identify a wide variety of gestures. The academic community has also developed systems 
capable of identifying human gestures either by employing cameras or by placing sensors on the human body. Recent 
work has also leveraged narrowband signals in the 2.4 GHz range to identify human activities in line-of-sight using 
micro-Doppler signatures. Wi-Vi, however, presents the first gesture-based interface that works in non-line-of-sight 
scenarios, and even through a wall, yet does not require the human to carry a wireless device or wear a set of 
sensors. 

Infrared and thermal imaging: Similar to Wi-Vi, these technologies extend human vision beyond the visible 
electromagnetic range, allowing us to detect objects in the dark or in smoke. They operate by capturing infrared or 
thermal energy reflected off the first obstacle in line-of-sight of their sensors. However, cameras based on these 
technologies cannot see through walls because they have very short wavelengths (a few µm to sub-mm), unlike Wi-Vi, 
which employs signals whose wavelengths are 12.5 cm. 


3. WI-VI OVERVIEW 


Wi-Vi is a wireless device that captures moving objects behind a wall. It leverages the ubiquity of Wi-Fi chipsets, 
and its design incorporates two main components: 

1) The first component eliminates the flash reflected off the wall by performing MIMO nulling; 

2) The second component tracks a moving object by treating the object itself as an antenna array using a technique 
called inverse SAR. 

Wi-Vi can be used in one of two modes, depending on the user’s choice. In mode 1, it can be used to image moving 
objects behind a wall and track them. In mode 2, on the other hand, Wi-Vi functions as a gesture-based interface from 
behind a wall that enables humans to compose messages and send them to the Wi-Vi receiver. In Sections 4-6, we 
describe Wi-Vi’s operation in detail. 

Table 1: One-Way RF Attenuation in Common Building Materials at 2.4 GHz [1] 


4. ELIMINATING THE FLASH 

In any through-wall system, the signal reflected off the wall, i.e., the flash, is much stronger than any signal reflected 
from objects behind the wall. This is due to the significant attenuation which electromagnetic signals suffer when 
penetrating dense obstacles. Table 1 shows a few examples of the one-way attenuation experienced by Wi-Fi signals 
in common construction materials (based on [1]). For example, a one-way traversal of a standard hollow wall or a 
concrete wall can reduce Wi-Fi signal power by 9 dB and 18 dB respectively. Since through-wall systems require 
traversing the obstacle twice, the one-way attenuation doubles, leading to an 18-36 dB flash effect in typical indoor 
scenarios. This problem is exacerbated by two other parameters: First, the actual reflected signal is significantly 
weaker since it depends both on the reflection coefficient as well as the cross-section of the object. The wall is 
typically much larger than the objects of interest, and has a higher reflection coefficient [11]. Second, in addition to 
the direct flash caused by reflections off the wall, through-wall systems have to eliminate the direct signal from the 
transmit to the receive antenna, which is significantly larger than the reflections of interest. Wi-Vi uses interference 
nulling to cancel both the wall reflections and the direct signal from the transmit to the receive antenna, hence 
increasing its sensitivity to the reflections of interest. 
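
The attenuation arithmetic above can be checked with a few lines of code. The sketch takes the two one-way figures quoted in the text (9 dB for a hollow wall, 18 dB for a concrete wall), doubles them for the round trip, and converts the result to a linear power ratio. The function names are hypothetical, and the calculation covers wall attenuation only, not the reflection coefficient or cross-section of the target.

```python
ONE_WAY_ATTENUATION_DB = {"hollow wall": 9, "concrete wall": 18}  # values quoted in the text


def two_way_attenuation_db(one_way_db):
    """Attenuation in dB after crossing the wall twice (into and out of the room)."""
    return 2 * one_way_db


def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio."""
    return 10 ** (db / 10)


for material, one_way_db in ONE_WAY_ATTENUATION_DB.items():
    round_trip_db = two_way_attenuation_db(one_way_db)
    print(material, round_trip_db, "dB", round(db_to_power_ratio(round_trip_db)), "x weaker")
```

A round trip through the concrete wall thus costs 36 dB, a power reduction by a factor of roughly four thousand, before the weaker reflectivity of the target is even taken into account.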


4.1. Nulling To Remove the Flash 

Recent advances show that MIMO systems can pre-code their transmissions such that the signal received at a 
particular antenna is cancelled. Past work on MIMO has used this property to enable concurrent transmissions and 
null interference. We observe that the same technique can be tailored to eliminate the flash effect as well as the 
direct signal from the transmit to the receive antenna, thereby enabling Wi-Vi to capture the reflections from objects 
of interest with minimal interference. At a high level, Wi-Vi’s nulling procedure can be divided into three phases: 
initial nulling, power boosting, and iterative nulling, as shown in Alg. 1. 

Initial Nulling: In this phase, Wi-Vi performs standard MIMO nulling. Recall that Wi-Vi has two transmit antennas and 
one receive antenna. First, the device transmits a known preamble $x$ only on its first transmit antenna. This preamble 
is received at the receive antenna as $y = h_1 x$, where $h_1$ is the channel between the first transmit antenna and 
the receive antenna. The receiver uses this signal in order to compute an estimate $\hat{h}_1$ of the channel. Second, 
the device transmits the same preamble $x$, this time only on its second antenna, and uses the received signal to 
compute an estimate $\hat{h}_2$ of the channel $h_2$ between the second transmit antenna and the receive antenna. 
Third, Wi-Vi uses these channel estimates to compute the ratio $p = -\hat{h}_1/\hat{h}_2$. Finally, the two transmit 
antennas transmit concurrently, where the first antenna transmits $x$ and the second transmits $px$. Therefore, the 
perceived channel at the receiver is 

$$h_{res} = h_1 + h_2 \left( -\frac{\hat{h}_1}{\hat{h}_2} \right) \approx 0 \qquad (1)$$

Algorithm 1 Pseudocode for Wi-Vi's Nulling 
INITIAL NULLING: 
    ▷ Channel Estimation 
    Tx ant. 1 sends x; Rx receives y; ĥ1 ← y/x 
    Tx ant. 2 sends x; Rx receives y; ĥ2 ← y/x 
    ▷ Pre-coding: p ← −ĥ1/ĥ2 
POWER BOOSTING: 
    Tx antennas boost power 
    Tx ant. 1 transmits x, Tx ant. 2 transmits px concurrently 
ITERATIVE NULLING: 
    i ← 0 
    repeat 
        Rx receives y; h_res ← y/x 
        if i even then 
            ĥ1 ← h_res + ĥ1 
        else 
            ĥ2 ← (1 − h_res/ĥ1) ĥ2 
        p ← −ĥ1/ĥ2 
        Tx antennas transmit concurrently 
        i ← i + 1 
    until Converges 


In the ideal case, where the estimates $\hat{h}_1$ and $\hat{h}_2$ are perfect, the perceived channel $h_{res}$ would be equal to zero. Hence, 
by the end of this phase Wi-Vi has eliminated the signals reflected off all static objects as well as the direct signal from 
the transmit antennas to the receive antenna. If no object moves, the channel will continue being nulled. However, 
since RF reflections combine linearly over the medium, if some object moves, its reflections will start showing up in 
the channel value. 
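
A small numerical sketch may help make the initial nulling step concrete. It is a noise-free toy model, not a description of the actual hardware: the channel values, the preamble symbol, and the helper `received` are made up for illustration. The two channels are estimated one antenna at a time, the second antenna is pre-coded with the ratio p, and the signal at the receive antenna is evaluated with and without an extra reflection from a moving object.

```python
# Illustrative values only; none of these numbers come from measurements.
x = 1 + 0j         # known preamble symbol
h1 = 0.9 + 0.4j    # static channel: Tx antenna 1 -> Rx antenna
h2 = -0.3 + 0.7j   # static channel: Tx antenna 2 -> Rx antenna

# Estimate each channel by sending the preamble on one antenna at a time.
h1_hat = (h1 * x) / x
h2_hat = (h2 * x) / x
p = -h1_hat / h2_hat  # pre-coding ratio applied to the second antenna


def received(extra1=0j, extra2=0j):
    """Signal at the Rx antenna when antenna 1 sends x and antenna 2 sends p*x.

    extra1 and extra2 model new reflections (from a moving object) that add
    linearly to the static channels after the estimates were taken.
    """
    return (h1 + extra1) * x + (h2 + extra2) * p * x


static_residual = abs(received())                        # nothing moved
moving_residual = abs(received(extra1=0.001 + 0.002j))   # a weak new reflection
print(static_residual, moving_residual)
```

With perfect estimates the static contributions cancel (up to floating-point rounding), while the small additional reflection is left standing on its own. This is the sense in which nulling focuses the receiver on moving targets.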

Power Boosting: Simply nulling static reflections, however, is not enough because the signals due to moving objects 
behind the wall are too weak. Say, for example, the flash effect was 30 to 40 dB above the power of reflections off 
moving objects. Even though we removed the flash effect, we can hardly discern the signal due to moving objects 
since it will be immersed in the receiver’s hardware noise. Thus, we next boost the transmitted signal power. Note 
that because the channel has already been nulled, i.e., $h_{res} \approx 0$, this increase in power does not saturate the 
receiver’s ADC. However, it increases the overall power that traverses the wall, and, hence, improves the SNR of the 
signal due to the objects behind the wall. 

Iterative Nulling: After boosting the transmit power, residual reflections which were below the ADC quantization 
level become measurable. Such reflections from static objects can create significant clutter in the tracking process if 
not removed. To address this issue, Wi-Vi performs a procedure called iterative nulling. At a high level, the objective is 
simple: we need to null the signal again after boosting the power to eliminate the residual reflections from static 
objects. The challenge, however, is that at this stage, we cannot separately estimate the channels from each of the 
two transmit antennas since, after nulling, we only receive a combined channel. We also cannot remove the nulling 
and re-estimate the channels, because after boosting the power, without nulling, the ADC would saturate. 
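
One way to see how nulling can be refined from the combined channel alone is to simulate alternating updates of the two estimates. The sketch below assumes the even-iteration update ĥ1 ← h_res + ĥ1 and the odd-iteration update ĥ2 ← (1 − h_res/ĥ1) ĥ2, applied to a noise-free channel with made-up values and deliberately imperfect initial estimates. The function name and all numbers are hypothetical, and a real receiver would also have to cope with measurement noise.

```python
def iterative_nulling(h1, h2, h1_hat, h2_hat, iterations=6):
    """Alternately refine the two channel estimates using only the residual channel.

    Returns the magnitude of the residual channel measured at each iteration.
    """
    residuals = []
    for i in range(iterations):
        p = -h1_hat / h2_hat   # pre-coding ratio from the current estimates
        h_res = h1 + p * h2    # combined channel seen by the receiver (y / x)
        residuals.append(abs(h_res))
        if i % 2 == 0:
            # Treat the second estimate as accurate and correct the first.
            h1_hat = h_res + h1_hat
        else:
            # Treat the first estimate as accurate and correct the second.
            h2_hat = (1 - h_res / h1_hat) * h2_hat
    return residuals


# Made-up true channels and initial estimates that are off by a few percent.
h1, h2 = 1.0 + 0.5j, 0.8 - 0.3j
residuals = iterative_nulling(h1, h2, h1_hat=h1 * 1.05, h2_hat=h2 * (1 - 0.03j))
for i, r in enumerate(residuals):
    print(i, r)
```

In this idealized setting the residual shrinks at every iteration, by a factor on the order of the relative estimation error. Note that only the ratio of the two estimates needs to converge for the null to deepen, so the individual estimates may remain slightly off.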


5. IDENTIFYING AND TRACKING HUMANS 

Now that we have eliminated the impact of static objects in the environment, we can focus on tracking moving 
objects. We will refer to moving objects as humans since they are the primary subjects of interest for our application; 
however, our system is general, and can capture other moving bodies. Below, we first explain how Wi-Vi tracks the 
motion of a single human. We then show how to extend our approach to track multiple moving humans. 


5.1. Tracking a Single Human 

Most prior through-wall systems track human motion using an antenna array. They steer the array’s beam to 
determine the direction of maximum energy. This direction corresponds to the signal’s spatial angle of arrival. By 
tracking that angle in time, they infer how the object moves in space. Wi-Vi, however, avoids using an antenna array 
for two reasons: First, in order to obtain a narrow beam and hence achieve a good resolution, one needs a large 
antenna array with many antenna elements. This would result in a bulky and expensive device. Second, since Wi-Vi 
eliminates the flash effect using MIMO nulling, adding multiple receive antennas would require nulling the signal at 
each of them. 


5.2. Tracking Multiple Humans 

In this section, we show how Wi-Vi extends its tracking procedure to multiple humans. Our previous discussion about 
using human motion to emulate an antenna array still holds. However, each human will emulate a separate antenna array. 
Since Wi-Vi has a single receive antenna, the received signal will be a superposition of the antenna arrays of the 
moving humans. In particular, instead of having one curved line as in Figure 3, at any time, there will be as many 
curved lines as moving humans at that point in time. However, with multiple humans, the noise increases significantly. 
On one hand, each human is not just one object because of different body parts moving in a loosely coupled way. On the 
other hand, the signal reflected off all of these humans is correlated in time, since they all reflect the transmitted 
signal. The lack of independence between the reflected signals is important. For example, the reflections of two 
humans may combine systematically to dim each other over some period of time. 

[Figure: Time samples as antenna arrays] 


6. THROUGH-WALL GESTURE-BASED COMMUNICATION 

For a human to transmit a message to a computer wirelessly, she 
typically has to carry a wireless device. In contrast, Wi-Vi can enable a human who does not carry any wireless device 
to communicate commands or short messages to a receiver using simple gestures. Wi-Vi designates a pair of gestures 
as a ‘0’ bit and a ‘1’ bit. A human can compose these gestures to create messages that have different interpretations. 
Additionally, Wi-Vi can evolve by borrowing other existing principles and practices from today’s communication 
systems, such as adding a simple code to ensure reliability, or reserving a certain pattern of ‘0’s and ‘1’s for packet 
preambles. At this stage, Wi-Vi’s interface is still very basic, yet we believe that future advances in through-wall 
technology can render this interface more expressive. Below, we describe the gesture-based communication channel 
that we implemented with Wi-Vi. 


6.1. Gesture Encoding 

At the transmitter side, the ‘0’ and ‘1’ bits must be encoded using some modulation scheme. Wi-Vi implements this 
encoding using gestures. One can envision a wide variety of gestures to represent these bits. However, in choosing our 
encoding we have imposed three conditions: 1) The gestures must be composable, i.e., at the end of each bit, 
whether ‘0’ or ‘1’, the human should be back in the same initial state as the start of the gesture. This enables the 
person to compose multiple such gestures to send a longer message. 2) The gestures must be simple so that a human 
finds it easy to perform them and compose them. 3) The gestures should be easy to detect and decode without 
requiring sophisticated decoders, such as machine learning classifiers. 

Given the above constraints, we have selected the following gestures to modulate the bits: a ‘0’ bit is a step 
forward followed by a step backward; a ‘1’ bit is a step backward followed by a step forward. This modulation is 
similar to Manchester encoding, where a ‘0’ bit is represented by a falling edge of the clock (i.e., an increase in the 
signal value followed by a decrease), and a ‘1’ bit is represented by a rising edge of the clock (i.e., a reduction in 
signal value followed by an increase). 
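
The mapping from bits to steps, and the composability condition it is meant to satisfy, can be written down in a few lines. This is only an illustration of the encoding rule stated above. The names `STEPS_FOR_BIT`, `gestures_for`, and `net_displacement` are hypothetical, and steps are idealized as unit displacements.

```python
# A '0' is a step forward then a step backward; a '1' is the reverse.
STEPS_FOR_BIT = {0: ("forward", "backward"), 1: ("backward", "forward")}
DISPLACEMENT = {"forward": +1, "backward": -1}  # idealized unit steps


def gestures_for(bits):
    """Return the sequence of steps a person would perform to send the bits."""
    steps = []
    for bit in bits:
        steps.extend(STEPS_FOR_BIT[bit])
    return steps


def net_displacement(steps):
    """Net movement after performing the steps (zero means back at the start)."""
    return sum(DISPLACEMENT[step] for step in steps)


message = [1, 0, 1, 1]
steps = gestures_for(message)
print(steps, net_displacement(steps))
```

Because each bit consists of two opposite steps, the person returns to the starting position after every bit, which is what allows gestures to be chained into longer messages.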

Figure 4 shows the signal captured by Wi-Vi, at the output of the smoothed MUSIC algorithm for each of these 
two gestures. Taking a step forward towards the Wi-Vi device produces a positive angle, whereas taking a step 
backward produces a negative angle. The exact values of the produced angles depend on whether the human is 
exactly oriented towards the device. Recall that the angle is between the vector orthogonal to the motion and the 
line connecting the human to the Wi-Vi device, and its sign is positive when the human is moving toward Wi-Vi and 
negative when the human moves away from Wi-Vi. 


6.2. Gesture Decoding 

Decoding the above gestures is fairly simple and follows standard communication techniques. Specifically, Wi-Vi’s 
decoder takes as input $A'[\theta, n]$, the output of the smoothed MUSIC algorithm. Similar to a standard decoder [16], Wi-Vi applies a matched filter on this signal. 
However, since each bit is a combination of two steps, forward and backward, Wi-Vi applies two matched filters: one 
for the step forward and one for the step backward. Because of the structure of the signal shown in Figure 4, the two 
matched filters are simply a triangle above the zero line, and an inverted triangle below the zero line. Wi-Vi applies 
these filters separately on the received signal, then adds up their output. 
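
The sign-based decoding idea can be sketched on a synthetic signal. The example below is a simplification and not the authors' pipeline: instead of running two separate filters on the MUSIC output, it correlates each bit slot of a one-dimensional angle trace with a single Manchester-style template (an inverted triangle followed by a triangle, i.e., a step backward followed by a step forward) and takes the sign of the result. The pulse length, noise level, and all function names are assumptions.

```python
import random

SAMPLES_PER_STEP = 20  # assumed number of samples covering one step


def triangle(n):
    """Triangular pulse with n samples, rising from 0 to a peak near the centre."""
    half = (n - 1) / 2
    return [1 - abs(i - half) / half for i in range(n)]


STEP_FORWARD = triangle(SAMPLES_PER_STEP)       # positive angle
STEP_BACKWARD = [-v for v in STEP_FORWARD]      # negative angle
ONE_TEMPLATE = STEP_BACKWARD + STEP_FORWARD     # a '1' bit: backward, then forward


def encode(bits, noise_std=0.2, seed=0):
    """Synthesize a noisy angle trace for the given bits (a '0' is a negated '1')."""
    rng = random.Random(seed)
    signal = []
    for bit in bits:
        sign = 1 if bit == 1 else -1
        signal.extend(sign * v + rng.gauss(0, noise_std) for v in ONE_TEMPLATE)
    return signal


def decode(signal):
    """Correlate each bit slot with the '1' template and decide by the sign."""
    slot = len(ONE_TEMPLATE)
    bits = []
    for start in range(0, len(signal), slot):
        window = signal[start:start + slot]
        correlation = sum(s * t for s, t in zip(window, ONE_TEMPLATE))
        bits.append(1 if correlation > 0 else 0)
    return bits


sent = [1, 0, 0, 1, 1, 0]
print(decode(encode(sent)) == sent)
```

The correlation is positive for a ‘1’ and negative for a ‘0’, mirroring the BPSK-like behaviour described for the decoder. This sketch assumes that bit boundaries are known, whereas a practical receiver would also need to synchronize to the start of each gesture.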


7. CONCLUSIONS 

We present Wi-Vi, a wireless technology that uses Wi-Fi signals to detect moving humans behind walls and in closed 
rooms. In contrast to previous systems, which are targeted at the military, Wi-Vi enables small, cheap 
see-through-wall devices that operate in the ISM band, rendering them feasible for the general public. Wi-Vi also 
establishes a communication channel between itself and a human behind a wall, allowing him/her to communicate directly 
with Wi-Vi without carrying any transmitting device. We believe that Wi-Vi is an instance of a broader set of 
functionality that future wireless networks will provide. Future Wi-Fi networks will likely expand beyond 
communications and deliver services such as indoor localization, sensing, and control. Wi-Vi demonstrates an advanced 
form of Wi-Fi-based sensing and localization by using Wi-Fi to track humans behind a wall, even when they do not carry 
a wireless device. It also raises issues of importance to the networking community pertinent to user privacy and 
regulations concerning the use of Wi-Fi signals. Finally, Wi-Vi bridges state-of-the-art networking techniques with 
human-computer interaction. It motivates a new form of user interface that relies solely on the reflections of a 
transmitted RF signal to identify human gestures. We envision that by leveraging finer nulling techniques and 
employing better hardware, the system can evolve to see humans through denser building material and at a longer 
range. These improvements will further allow Wi-Vi to capture higher quality images, enabling the gesture-based 
interface to become more expressive and hence promising new directions for virtual reality. 